Abstract. Recent advances suggest that a wide range of computer vision problems
can be addressed more appropriately by considering non-Euclidean geometry. This
paper tackles the problem of sparse coding and dictionary learning in the space of
symmetric positive definite matrices, which form a Riemannian manifold. With the aid
of the recently introduced Stein kernel (related to a symmetric version of Bregman
matrix divergence), we propose to perform sparse coding by embedding Riemannian
manifolds into reproducing kernel Hilbert spaces. This leads to a convex and kernel
version of the Lasso problem, which can be solved efficiently. We furthermore propose
an algorithm for learning a Riemannian dictionary (used for sparse coding), closely
tied to the Stein kernel. Experiments on several classification tasks (face
recognition, texture classification, person re-identification) show that the proposed
sparse coding approach achieves notable improvements in discrimination accuracy, in
comparison to state-of-the-art methods such as tensor sparse coding, Riemannian
locality preserving projection, and symmetry-driven accumulation of local features.

1 Introduction

Sparse representation (SR), the linear decomposition of a signal using a few atoms of a
dictionary, has led to notable results for various image processing and computer vision
tasks [1,2]. While significant steps have been taken towards expanding the theory of
SR, such representations in non-Euclidean spaces have received comparatively little
attention. This paper tackles the problem of sparse coding within the space of symmetric
positive definite (SPD) matrices.

SPD matrices are fundamental building blocks in computer vision and machine
learning. A notable example is the covariance descriptor [3], which offers a compact way
of describing regions/cuboids in images/videos and of fusing multiple features.
Covariance descriptors have been exploited in several applications, such as diffusion
tensor imaging [4], action recognition [5,6,7], pedestrian detection [3], face
recognition [8,9], texture classification [9,10], and tracking [11].

SPD matrices form a cone of zero curvature and can be analysed using the geometry
of Euclidean space. However, several studies have shown that a Riemannian structure
of negative curvature is more suitable for analysing SPD matrices [4,12]. More
specifically, Pennec et al. [4] introduced the Affine Invariant Riemannian Metric (AIRM)
and showed that the induced Riemannian structure is invariant to inversion and similarity
transforms. The AIRM is perhaps the most widely used similarity measure for SPD
matrices. Nevertheless, efficiently and accurately handling the Riemannian structure is
non-trivial as basic computations on Riemannian manifolds (such as similarities and
distances) involve non-linear operators. This not only hinders the development of
optimisation algorithms but also incurs a significant numerical burden.

To address the above drawbacks, in this paper we propose to perform the sparse 
coding of SPD matrices by embedding Riemannian manifolds into reproducing kernel 
Hilbert spaces (RKHS) [13]. This is in contrast to directly embedding into Euclidean 
spaces [7,6,14]. 

Related Work. Sra et al. [14] used the cone of SPD matrices and the Frobenius 
norm as a measure of similarity between SPD matrices. While this results in a regu- 
larised non-negative least-squares approach, it does not consider the Riemannian ge- 
ometry induced by AIRM. 

Guo et al. [6] and Yuan et al. [7] separately proposed to solve sparse representation
by a log-Euclidean approach, where a Riemannian problem is converted to a Euclidean
one by embedding manifolds into tangent spaces. While log-Euclidean approaches
benefit from simplicity, the true geometry of the manifold is not taken into account.
More specifically, on a tangent space only distances to the pole of the tangent space
are true geodesic distances. As such, the pairwise distances between arbitrary points
on the tangent space do not represent the structure of the manifold.

Sivalingam et al. [9] used Burg divergence [15] as a metric and reformulated the
Riemannian¹ SR problem as a determinant maximisation problem. This has the
advantage of avoiding the explicit manifold embedding, as well as resulting in a convex
MAXDET problem [16] that can be solved by interior point methods. However, there
are two downsides: the solution is computationally very expensive, and the relations
between Burg divergence and the geometry of Riemannian manifolds were not well
established.

Contributions. With the aid of the recently introduced Stein kernel [17], which is 
related to AIRM via a tight bound, we propose a Riemannian sparse solver by embed- 
ding Riemannian manifolds into RKHS. We show that the embedding leads to a convex 
and kernelised version of the Lasso problem [1], which can be solved efficiently. We 
furthermore propose a sparsity-maximising algorithm for dictionary learning within the 
space of SPD matrices, closely tied to the Stein kernel. Lastly, we show that the pro- 
posed sparse coding approach obtains superior performance on several visual classifica- 
tion tasks (face recognition, texture classification, person re-identification), in compar- 
ison to several state-of-the-art methods: tensor sparse coding [9], log-Euclidean sparse 
representation [6,7], Gabor feature based sparse representation [18], and Riemannian 
locality preserving projection [10]. 

We continue the paper as follows. Section 2 begins with an overview of Bregman
divergence and the Stein kernel. Section 3 describes the proposed kernel solution of
Riemannian sparse coding, followed by Section 4, which covers the problem of
dictionary learning on Riemannian manifolds. In Section 5 we compare the performance of
the proposed method with previous approaches on several visual classification tasks.
The main findings and possible future directions are summarised in Section 6.

¹ We loosely use 'Riemannian' to refer to the Riemannian manifold formed by SPD matrices.

2 Background 

In this section we first overview the properties of Bregman matrix divergences, includ- 
ing a special case known as the symmetric Stein divergence. This leads to the Stein 
kernel, which can be used to embed Riemannian manifolds into RKHS. 

2.1 Bregman Matrix Divergences 

The Bregman matrix divergence for two symmetric matrices X and Y is defined as [15]:

$$D_\zeta(X,Y) \triangleq \zeta(X) - \zeta(Y) - \langle \nabla\zeta(Y),\, X - Y \rangle \tag{1}$$

where $\langle A, B \rangle = \mathrm{Tr}(A^T B)$ and $\zeta$ is a real valued, strictly convex function on
symmetric matrices. Bregman divergences are non-negative, definite, and in general
asymmetric. Among the several ways to symmetrise them, the Jensen-Shannon
symmetrisation is often used [15]:

$$D_\zeta^{JS}(X,Y) \triangleq \frac{1}{2} D_\zeta\left(X, \frac{X+Y}{2}\right) + \frac{1}{2} D_\zeta\left(Y, \frac{X+Y}{2}\right) \tag{2}$$

If $\zeta(X) = -\log(\det(X))$, then the symmetric Stein divergence is obtained from (2):

$$S(X,Y) \triangleq \log\left(\det\left(\frac{X+Y}{2}\right)\right) - \frac{1}{2}\log(\det(XY)), \quad \text{for } X, Y \succ 0 \tag{3}$$

The space induced by AIRM on symmetric positive definite matrices of dimension d
is a Riemannian manifold $Sym_d^+$ of negative curvature. For two points
$X, Y \in Sym_d^+$, the AIRM is defined as
$d_g^2(X,Y) = \|\log_X(Y)\|_X^2 = \mathrm{Tr}\{\log^2(X^{-1/2} Y X^{-1/2})\}$, where
$\log_X(Y) = X^{1/2} \log(X^{-1/2} Y X^{-1/2}) X^{1/2}$. The symmetric Stein divergence and
Riemannian metric over $Sym_d^+$ manifolds are related in several aspects. Two important
properties are summarised below.

Property 1. Let $X, Y \in Sym_d^+$, and $\delta_T(X,Y) = \max_{1 \le i \le d}\{|\log \lambda_i(XY^{-1})|\}$ be the
Thompson metric [19] with $\lambda(XY^{-1})$ representing the vector of eigenvalues of $XY^{-1}$.
The following sandwiching inequality between the symmetric Stein divergence and
Riemannian metric exists [17]:

$$S(X,Y) \le \frac{1}{8} d_g^2(X,Y) \le \frac{1}{4}\,\delta_T(X,Y)\left(S(X,Y) + d\log d\right) \tag{4}$$



Property 2. The curve $\gamma(p) = X^{1/2}\left(X^{-1/2} Y X^{-1/2}\right)^p X^{1/2}$ parameterises the unique
geodesic between the SPD matrices X and Y. On this curve the Riemannian geodesic
distance satisfies $d_g(X, \gamma(p)) = p\, d_g(X,Y)$; $p \in [0,1]$ [12]. The symmetric Stein
divergence satisfies a similar but slightly weaker result, $S(X, \gamma(p)) \le p\, S(X,Y)$.



The first property establishes a bound between the geodesic distance and Stein 
divergence, providing motivation for addressing Riemannian problems via the diver- 
gence. The second property reinforces the motivation by explaining that the behaviour 
of Stein divergences along geodesic curves is similar to true Riemannian geometry. 
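
As a numerical illustration of Eqn. (3) and Property 2, the following minimal NumPy sketch computes the symmetric Stein divergence, the AIRM geodesic distance and a point on the geodesic. The function names and the `random_spd` helper are illustrative assumptions and are not taken from the released source code.

```python
import numpy as np

def spd_power(X, p):
    """Return X**p for a symmetric positive definite X via eigendecomposition."""
    w, U = np.linalg.eigh(X)
    return (U * w ** p) @ U.T

def stein_divergence(X, Y):
    """S(X, Y) = log det((X + Y) / 2) - 0.5 * log det(X Y), as in Eqn. (3)."""
    _, ld_mid = np.linalg.slogdet((X + Y) / 2.0)
    _, ld_x = np.linalg.slogdet(X)
    _, ld_y = np.linalg.slogdet(Y)
    return ld_mid - 0.5 * (ld_x + ld_y)

def airm_distance(X, Y):
    """Geodesic distance: Frobenius norm of log(X^{-1/2} Y X^{-1/2})."""
    Xi = spd_power(X, -0.5)
    M = Xi @ Y @ Xi
    w = np.linalg.eigvalsh((M + M.T) / 2.0)  # symmetrise against round-off
    return np.sqrt(np.sum(np.log(w) ** 2))

def geodesic_point(X, Y, p):
    """gamma(p) = X^{1/2} (X^{-1/2} Y X^{-1/2})^p X^{1/2} (Property 2)."""
    Xh, Xi = spd_power(X, 0.5), spd_power(X, -0.5)
    M = Xi @ Y @ Xi
    return Xh @ spd_power((M + M.T) / 2.0, p) @ Xh

def random_spd(d, rng):
    """Hypothetical helper: a well-conditioned random SPD matrix."""
    A = rng.standard_normal((d, 2 * d))
    return A @ A.T / (2 * d) + 0.1 * np.eye(d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, Y = random_spd(4, rng), random_spd(4, rng)
    G = geodesic_point(X, Y, 0.5)
    print(airm_distance(X, G), 0.5 * airm_distance(X, Y))
    print(stein_divergence(X, G), 0.5 * stein_divergence(X, Y))
```

The first printed pair should agree up to round-off, while the first value of the second pair should not exceed the second value.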

2.2 Stein Kernel 

Definition 1. Let $\Omega = \{X_1, X_2, \cdots, X_M\}$ be a non-empty set on Riemannian manifold
$Sym_d^+$. A function $\varphi : \Omega \times \Omega \rightarrow R^+$ is a Riemannian kernel if $\varphi$ is symmetric for all
$X, Y \in \Omega$, ie., $\varphi(X,Y) = \varphi(Y,X)$, and the following inequality is satisfied for all
$a_i \in R$:

$$\sum\nolimits_{i,j} a_i a_j \varphi(X_i, X_j) \ge 0$$

Under a mild condition (explained afterwards), the following function forms a
Riemannian kernel [17]:

$$k(X,Y) = e^{-\sigma S(X,Y)} = 2^{d\sigma}\, \frac{\sqrt{\det(X)^{\sigma} \det(Y)^{\sigma}}}{\det(X+Y)^{\sigma}} \tag{5}$$

We shall refer to this kernel as the Stein kernel from here on. The following theorem
states the condition under which the Stein kernel is positive definite.

Theorem 1. Let $\Omega = \{X_1, X_2, \cdots, X_N\}$; $X_i \in Sym_d^+$ be a set of Riemannian points.
The $N \times N$ matrix $K_\sigma = [k_\sigma(i,j)]$; $1 \le i,j \le N$, with $k_\sigma(i,j) = k(X_i, X_j)$, defined
in (5), is positive definite iff:

$$\sigma \in \left\{\frac{1}{2}, \frac{2}{2}, \cdots, \frac{d-1}{2}\right\} \cup \left\{\tau \in R : \tau > \frac{1}{2}(d-1)\right\} \tag{6}$$

Interested readers can follow the proof in [17]. For values of $\sigma$ outside of the above
set, it is possible to convert a pseudo kernel into a true kernel, as discussed for
example in [20]. The determinant of a $d \times d$ SPD matrix can be efficiently computed by
Cholesky decomposition in $O(d^3)$. As such, the complexity of computing the Stein kernel
is $O(3d^3 + 3\sigma)$.
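
The kernel in Eqn. (5) and the condition of Theorem 1 can be sketched as follows, with determinants obtained through Cholesky factors as described above. This is a minimal sketch under the assumption that all inputs are strictly positive definite; the names `logdet_chol`, `stein_gram` and `sigma_is_admissible` are illustrative only.

```python
import numpy as np

def logdet_chol(X):
    """log det(X) of an SPD matrix from its Cholesky factor."""
    L = np.linalg.cholesky(X)
    return 2.0 * np.sum(np.log(np.diag(L)))

def stein_kernel(X, Y, sigma):
    """k(X, Y) = exp(-sigma * S(X, Y)), Eqn. (5)."""
    s = logdet_chol((X + Y) / 2.0) - 0.5 * (logdet_chol(X) + logdet_chol(Y))
    return np.exp(-sigma * s)

def stein_gram(As, Bs, sigma):
    """Matrix of kernel values between two lists of SPD matrices."""
    ld_a = [logdet_chol(A) for A in As]
    ld_b = [logdet_chol(B) for B in Bs]
    K = np.empty((len(As), len(Bs)))
    for i, A in enumerate(As):
        for j, B in enumerate(Bs):
            s = logdet_chol((A + B) / 2.0) - 0.5 * (ld_a[i] + ld_b[j])
            K[i, j] = np.exp(-sigma * s)
    return K

def sigma_is_admissible(sigma, d):
    """Check the condition of Theorem 1, Eqn. (6), for d x d matrices."""
    if sigma > 0.5 * (d - 1):
        return True
    twice = 2.0 * sigma
    return abs(twice - round(twice)) < 1e-12 and 1 <= round(twice) <= d - 1
```

Caching the per-matrix log-determinants, as `stein_gram` does, leaves one Cholesky factorisation of $X + Y$ per kernel entry.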

3 Kernel Sparse Coding 

Sparse coding on Riemannian manifolds in general means that a given query point on
a manifold can be expressed as a sparse "combination" of dictionary elements. Our
idea here is to embed the manifold into RKHS and replace the idea of "combination"
on manifolds with the general concept of linear combination in Hilbert spaces. More
specifically, given a Riemannian dictionary $\mathbb{D} = \{D_1, D_2, \cdots, D_N\}$, $D_i \in Sym_d^+$,
and an embedding function $\phi : Sym_d^+ \rightarrow \mathcal{H}$, for a Riemannian point X we seek
a sparse vector $v \in R^N$ such that $\phi(X)$ admits the sparse representation v over
$\{\phi(D_1), \phi(D_2), \cdots, \phi(D_N)\}$. In other words, we are interested in solving the following
kernelised version of the Lasso problem [1]:

$$\min_{v} \left( \left\| \phi(X) - \sum\nolimits_{j=1}^{N} v_j \phi(D_j) \right\|_2^2 + \lambda \|v\|_1 \right) \tag{7}$$



The first term in (7) can be expanded as:

$$\left\| \phi(X) - \sum\nolimits_{j=1}^{N} v_j \phi(D_j) \right\|_2^2
 = k(X,X) - 2\sum\nolimits_{j=1}^{N} v_j k(X, D_j) + \sum\nolimits_{i=1}^{N}\sum\nolimits_{j=1}^{N} v_i v_j k(D_i, D_j)$$

$$ = k(X,X) - 2 v^T \mathcal{K}(X,\mathbb{D}) + v^T \mathbb{K}(\mathbb{D},\mathbb{D})\, v \tag{8}$$

where $\mathcal{K} = [a_i]_{N \times 1}$; $a_i = k(X, D_i)$ and $\mathbb{K} = [a_{ij}]_{N \times N}$; $a_{ij} = k(D_i, D_j)$. This reveals
that the optimisation problem in (7) is convex and similar to its counterpart in Euclidean
space, except for the definition of $\mathcal{K}$ and $\mathbb{K}$. Consequently, greedy or relaxation solutions
can be adapted to obtain the sparse codes [1]. To solve problem (7) we used CVX [21],
a package for specifying and solving convex programs².
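
Because (8) depends on the data only through kernel values, problem (7) can be handed to any Lasso-type solver. The sketch below uses plain iterative soft-thresholding (ISTA) in NumPy instead of CVX; this swapped-in solver, the function names and the iteration count are assumptions made for illustration and are not the implementation used in the experiments.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def kernel_lasso_objective(v, k_xx, k_xd, K_dd, lam):
    """k(X,X) - 2 v^T K(X,D) + v^T K(D,D) v + lam * ||v||_1, from (7) and (8)."""
    return k_xx - 2.0 * v @ k_xd + v @ K_dd @ v + lam * np.sum(np.abs(v))

def kernel_lasso_ista(k_xd, K_dd, lam, n_iter=500):
    """Minimise the kernelised Lasso objective with ISTA.

    k_xd : (N,) vector of k(X, D_j); K_dd : (N, N) matrix of k(D_i, D_j).
    """
    # Lipschitz constant of the gradient of the smooth part, 2 (K v - k).
    lipschitz = 2.0 * np.linalg.eigvalsh(K_dd)[-1]
    v = np.zeros_like(k_xd, dtype=float)
    for _ in range(n_iter):
        grad = 2.0 * (K_dd @ v - k_xd)
        v = soft_threshold(v - grad / lipschitz, lam / lipschitz)
    return v
```

With the Stein kernel, `k_xd` and `K_dd` would hold the values $k(X, D_j)$ and $k(D_i, D_j)$, and `k_xx` equals 1 since $S(X,X) = 0$.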

3.1 Classification Using Sparse Codes 

There are two main approaches for classification based on the obtained sparse codes
(vectors) for a given query sample: (i) directly, and (ii) indirectly, with the aid of a
Euclidean-based classifier. We elucidate the two approaches below.

(i) If the atoms in sparse dictionary $\mathbb{D}$ have associated class labels (ie. each atom
in the dictionary is a training sample), the sparse codes can be directly used for
classification. This approach is applicable only to closed-set identification tasks. Let
$\boldsymbol{v}_i = [v_1\delta(l(1)-i),\ v_2\delta(l(2)-i),\ \cdots,\ v_N\delta(l(N)-i)]^T$ be the class-specific sparse codes,
where $l(j)$ is the class label of atom $D_j$ and $\delta(x)$ is the discrete Dirac function [22].
An efficient way of using class-specific sparse codes is through computing residual
errors [2]. In this case, the residual error of query sample X for class i is defined as:

$$\varepsilon_i(X) = \left\| \phi(X) - \sum\nolimits_{j=1}^{N} v_j \phi(D_j)\,\delta(l(j) - i) \right\|_2^2 \tag{9}$$

which can be computed via the use of a Riemannian kernel in a similar manner to (8).
The class with the minimum residual error is deemed to represent the query.
Alternatively, the similarity between query sample X and class i can be defined as
$S_i(X) = h_i(v)$. The function $h_i(v)$ can be linear like $\sum_{j=1}^{N} v_j \delta(l(j) - i)$ or even
non-linear like $\max_j\left(v_j \delta(l(j) - i)\right)$.

(ii) If the atoms in the sparse dictionary $\mathbb{D}$ are not labelled (eg. $\mathbb{D}$ is a generic
dictionary not tied to any particular class [23]), the generated sparse codes (vectors) for
both training and query data can be fed to Euclidean-based classifiers, such as support
vector machines [22]. The sparse code is hence interpreted as a feature vector, which in
essence means that a classification problem on a Riemannian manifold is converted to
a Euclidean classification problem. This approach is applicable to both closed-set and
open-set classification tasks.
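
The residual rule of approach (i) can be evaluated purely from kernel values by expanding Eqn. (9) in the same way as (8). The following NumPy sketch does so; the function name `classify_by_residual` and its argument layout are hypothetical.

```python
import numpy as np

def classify_by_residual(v, k_xx, k_xd, K_dd, atom_labels):
    """Return (predicted label, per-class residuals) following Eqn. (9).

    v           : (N,) sparse code of the query
    k_xx        : scalar k(X, X)
    k_xd        : (N,) vector of k(X, D_j)
    K_dd        : (N, N) matrix of k(D_i, D_j)
    atom_labels : (N,) class label l(j) of each atom
    """
    labels = np.asarray(atom_labels)
    residuals = {}
    for c in np.unique(labels):
        # Keep only the coefficients of atoms that belong to class c.
        v_c = np.where(labels == c, v, 0.0)
        residuals[c.item()] = k_xx - 2.0 * v_c @ k_xd + v_c @ K_dd @ v_c
    predicted = min(residuals, key=residuals.get)
    return predicted, residuals
```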

4 Learning Riemannian Dictionaries 

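If the indirect classification of sparse codes is required (as elucidated in the preceding
section), a Riemannian dictionary must first be learned. Given a set of Riemannian points
$\Omega = \{X_1, X_2, \cdots, X_M\}$; $X_i \in Sym_d^+$, learning a dictionary $\mathbb{D} = \{D_1, D_2, \cdots, D_N\}$;
$D_i \in Sym_d^+$ can be formulated as jointly minimising the energy function

$$f(\mathbb{V}, \mathbb{D}) \triangleq \sum\nolimits_{i=1}^{M} \left( \left\| \phi(X_i) - \sum\nolimits_{j=1}^{N} v_{i,j}\, \phi(D_j) \right\|_2^2 + \lambda \|v_i\|_1 \right) \tag{10}$$

over the dictionary and the sparse codes $\mathbb{V} = \{v_1, v_2, \cdots, v_M\}$; $v_i \in R^N$, ie.,
$\min_{\mathbb{V},\mathbb{D}} f(\mathbb{V}, \mathbb{D})$.

² The SPAMS package can also be used: http://spams-devel.gforge.inria.fr/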

Among the various solutions to the problem of dictionary learning in Euclidean 
spaces, iterative methods like K-SVD have received much attention [1]. Borrowing the 
idea from Euclidean spaces, we propose to minimise the energy in (10) iteratively. After 
initialising the dictionary D, for example by Riemannian clustering using the Karcher 
mean [4], we iterate between a sparse coding step and a dictionary update step. In the 
sparse coding step, D is fixed and V is computed. In the dictionary update step, V is 
fixed while D is updated, with each dictionary atom updated independently. 

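The derivative of (10) with respect to $D_r$, while $\mathbb{V}$ and other atoms are fixed, is:

$$\frac{\partial f}{\partial D_r} = \sum\nolimits_{i=1}^{M} \left( -2 v_{i,r} \frac{\partial k(X_i, D_r)}{\partial D_r} + \sum\nolimits_{j \ne r} v_{i,r} v_{i,j} \frac{\partial k(D_j, D_r)}{\partial D_r} \right) \tag{11}$$

As $\nabla_X S(X,Y) = (X+Y)^{-1} - \frac{1}{2}X^{-1}$, (11) can be further simplified to:

$$\frac{\partial f}{\partial D_r} = \sigma \sum\nolimits_{i=1}^{M} \Big[ 2 v_{i,r}\, k(X_i, D_r) \left( (X_i + D_r)^{-1} - \tfrac{1}{2} D_r^{-1} \right)
 - \sum\nolimits_{j \ne r} v_{i,r} v_{i,j}\, k(D_j, D_r) \left( (D_j + D_r)^{-1} - \tfrac{1}{2} D_r^{-1} \right) \Big] \tag{12}$$

Since (12) contains linear and non-linear terms of $D_r$ (eg. inverse and kernel terms),
a closed-form solution for computing its root, ie., $D_r$, cannot be sought. As such, we
propose an alternative solution by exploiting previous values of $k(\cdot, D_r)$ and
$(D_j + D_r)^{-1}$ in the updating step. More specifically, rearranging (12) and estimating
$k(\cdot, D_r)$ as well as $(D_j + D_r)^{-1}$ by their previous values, atom $D_r$ at iteration $t+1$
is updated using: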

^'*^'' = x^ ( T^m''n^ 9Hy n ^^ (^'*'M + G'*^(r-))" (13) 



where

$$F^{(t)}(r) = \sum\nolimits_{i=1}^{M} 2 v_{i,r}\, k^{(t)}(X_i, D_r) \left( X_i + D_r^{(t)} \right)^{-1} \tag{14}$$



To avoid the degenerative case (due to numerical inconsistency), atoms are nor- 
malised by their second norms at the end of each iteration. Algorithm 1 assembles all 
the above details into pseudo-code for dictionary learning. 

5 Experiments 

Two sets of experiments³ are presented in this section. In the first set, we evaluate the
performance of the proposed Riemannian SR (RSR) method (as described in Section 3)
without dictionary learning. Each atom in the dictionary is a training sample. This is
to contrast RSR to previous state-of-the-art methods on several popular closed-set
classification tasks. We use the residual error approach for classification, as described
in Eqn. (9).

³ Matlab/Octave source code is available at http://itee.uq.edu.au/~uqmharal

Algorithm 1: Dictionary learning over $Sym_d^+$ using the Stein kernel

Input:
  - training set $\mathbb{X} = \{X_i\}_{i=1}^{M}$ from the underlying Riemannian manifold,
    where each $X_i \in Sym_d^+$ is an SPD matrix
  - Stein kernel function $k(X,Y)$, as defined in Eqn. (5)
  - niter, the number of iterations

Output:
  - Riemannian dictionary $\mathbb{D} = \{D_i\}_{i=1}^{N}$

 1: Initialise the dictionary $\mathbb{D}^{(1)} = \{D_i^{(1)}\}_{i=1}^{N}$ by selecting N samples from $\mathbb{X}$ randomly,
    or by clustering $\mathbb{X}$ on the manifold [24]
 2: for t = 1 → niter do
 3:   Compute $k^{(t)}(X_i, D_j^{(t)})$, $1 \le i \le M$, $1 \le j \le N$
 4:   Compute $k^{(t)}(D_i^{(t)}, D_j^{(t)})$, $1 \le i,j \le N$
 5:   Solve $\min_{v_i} -2 v_i^T \mathcal{K}(X_i, \mathbb{D}^{(t)}) + v_i^T \mathbb{K}(\mathbb{D}^{(t)}, \mathbb{D}^{(t)}) v_i + \lambda \|v_i\|_1$, $\forall X_i \in \mathbb{X}$
 6:   for r = 1 → N do
 7:     Compute $G^{(t)}(r) = \sum_{j=1, j \ne r}^{N} \sum_{i=1}^{M} v_{i,r} v_{i,j}\, k^{(t)}(D_j^{(t)}, D_r^{(t)}) \left( D_j^{(t)} + D_r^{(t)} \right)^{-1}$
 8:     Compute $F^{(t)}(r) = \sum_{i=1}^{M} 2 v_{i,r}\, k^{(t)}(X_i, D_r^{(t)}) \left( X_i + D_r^{(t)} \right)^{-1}$
 9:     Compute temp, the updated atom, using Eqn. (13)
10:     $D_r^{(t+1)} = \text{temp} / \|\text{temp}\|_2$
11:   end for
12: end for

In the second set, the performance of the RSR method is evaluated in conjunction
with dictionaries learned via three methods: random, Riemannian k-means, and the
proposed dictionary learning technique (as described in Section 4).

5.1 Riemannian Sparse Representation 

Synthetic Data. We first consider a multi-class classification problem over Sym% using 
synthetic data. We compared the proposed RSR against Tensor Sparse Coding (TSC) [9] 
and log-Euclidean Sparse Representation (logE-SR) [6,7]. The data used in the exper- 
iments constitutes 512 random samples from 4 classes. Half of the samples were used 
for training and the rest were used as test data. 

                          logE-SR [6,7]    TSC [9]        RSR (proposed)
    accuracy  easy        68.08 ± 2.5      60.04 ± 6.8    83.05 ± 3.0
    accuracy  hard        53.67 ± 3.2      50.35 ± 4.9    66.72 ± 2.7
    run time  easy+hard   6 sec            11107 sec      41 sec

Table 1. Average recognition accuracy (in %) and wall-clock time for the synthetic classification
tasks using log-Euclidean sparse representation [6,7], tensor sparse coding [9] and the proposed
RSR approach. Run time is represented by combining the times for the easy and hard tasks.

To create a Riemannian manifold, samples were generated over a particular tangent
space and then mapped back to the manifold using the exponential map [12]. The
positions of tangent spaces were chosen randomly and samples in each class obeyed a
normal distribution. By fixing the mean of each class and increasing the class variance
we created two classification problems: 'easy' and 'hard'. To draw useful statistics, the
data creation process was repeated 100 times.
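
The data creation step can be sketched as follows: a symmetric matrix is drawn in the tangent space at a base point P and mapped to the manifold with the exponential map of the affine invariant geometry, $\exp_P(V) = P^{1/2} \exp(P^{-1/2} V P^{-1/2}) P^{1/2}$. How the tangent vectors were actually drawn is not specified above, so the Gaussian sampling in `sample_class`, its parameters and all names are assumptions.

```python
import numpy as np

def sym_fun(S, fn):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, U = np.linalg.eigh(S)
    return (U * fn(w)) @ U.T

def exp_map(P, V):
    """Map a symmetric tangent vector V at the SPD point P onto the manifold."""
    Ph = sym_fun(P, np.sqrt)
    Pi = sym_fun(P, lambda w: 1.0 / np.sqrt(w))
    X = Ph @ sym_fun(Pi @ V @ Pi, np.exp) @ Ph
    return (X + X.T) / 2.0  # remove round-off asymmetry

def sample_class(P, mean_tangent, std, n, rng):
    """Draw n SPD samples whose tangent vectors at P are Gaussian (assumed)."""
    d = P.shape[0]
    samples = []
    for _ in range(n):
        A = std * rng.standard_normal((d, d))
        V = mean_tangent + (A + A.T) / 2.0  # symmetric perturbation
        samples.append(exp_map(P, V))
    return samples
```

Increasing `std` while keeping `mean_tangent` fixed mimics the move from the 'easy' to the 'hard' problem.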

Table 1 shows the average recognition accuracy and the total running time (in
seconds). All algorithms were implemented in Matlab and executed on a 3 GHz Intel CPU.
In terms of recognition accuracy, RSR obtains superior performance when compared
with previous state-of-the-art approaches. We note that by increasing the class variance,
samples from the four classes are intertwined, leading to a decrease in recognition
accuracy. The performance of logE-SR is higher than that of TSC, which might be due to
the fact that the generated data can be modelled by a Gaussian distribution over the
tangent space, hence favouring the tangent-based solution.

Focusing on run time, Table 1 suggests that logE-SR has the lowest complexity
while TSC has the highest. The proposed RSR method is substantially faster than TSC,
while delivering the highest recognition accuracy.

Face Recognition. We used the 'b' subset of the FERET dataset [25], which includes
1400 images from 198 subjects. The images were closely cropped around the face and
downsampled to 64 × 64. Examples are shown in Figure 1.

We performed four tests with various pose angles. Training data was composed of
images marked 'ba', 'bj' and 'bk' (ie., frontal faces with expression and illumination
variations). Images with 'bd', 'be', 'bf' and 'bg' labels (ie., non-frontal faces) were used
as test data. For Riemannian-based methods, a 43 × 43 covariance descriptor described
a face image, using the following features:

$$F_{x,y} = \left[\, I(x,y),\ x,\ y,\ |G_{0,0}(x,y)|,\ \cdots,\ |G_{0,7}(x,y)|,\ |G_{1,0}(x,y)|,\ \cdots,\ |G_{4,7}(x,y)| \,\right]$$

where $I(x,y)$ is the intensity value at position $x,y$ and $G_{u,v}(x,y)$ is the response of a
2D Gabor wavelet [26] centered at $x,y$ with orientation u and scale v:

$$G_{u,v}(x,y) = \frac{k_v^2}{4\pi^2} \sum\nolimits_{t,s} e^{-\frac{k_v^2}{8\pi^2}\left((x-s)^2 + (y-t)^2\right)} \left( e^{i k_v \left((x-t)\cos(\theta_u) + (y-s)\sin(\theta_u)\right)} - e^{-2\pi^2} \right)$$

with $k_v = \frac{1}{\sqrt{2^{v-1}}}$ and $\theta_u = \frac{\pi u}{8}$.





               PCA-SRC [2]   GSR [18]   logE-SR [6,7]   TSC [9]   RSR (proposed)
    bg         26.0          79.0       46.5            44.5      86.0
    bf         61.0          97.0       91.0            73.5      97.5
    be         55.5          93.5       81.0            73.0      96.5
    bd         27.5          77.0       34.5            36.0      79.5
    average    42.50         86.63      63.25           56.75     89.88

Table 2. Recognition accuracy (in %) for the face recognition task using PCA-SRC [2],
Gabor SR (GSR) [18], log-Euclidean sparse representation (logE-SR) [6,7], Tensor Sparse
Coding (TSC) [9], and the proposed RSR approach.



Table 2 shows a comparison of RSR against logE-SR [6,7], TSC [9], and two purely 
Euclidean sparse representations, PCA-SRC [2] and Gabor SR (GSR) [18]. In all cases 
the proposed RSR method obtains the highest accuracy. Furthermore, the proposed ap- 
proach significantly outperforms state-of-the-art Euclidean solutions, especially for test 
images with label 'bg'. 



Texture Classification. We performed a classification task using the Brodatz texture
dataset [27]. Examples are shown in Fig. 2. We followed the test protocol devised
in [9] and generated nine test scenarios with various numbers of classes. This includes
5-texture ('5c', '5m', '5v', '5v2', '5v3'), 10-texture ('10', '10v') and 16-texture ('16c',
'16v') mosaics. To create a Riemannian manifold, each image was first downsampled
to 256 × 256 and then split into 64 regions of size 32 × 32. The feature vector for any
pixel $I(x,y)$ is $F(x,y) = \left[\, I(x,y),\ \left|\frac{\partial I}{\partial x}\right|,\ \left|\frac{\partial I}{\partial y}\right|,\ \left|\frac{\partial^2 I}{\partial x^2}\right|,\ \left|\frac{\partial^2 I}{\partial y^2}\right| \,\right]$. Each region is described by
a 5 × 5 covariance descriptor of these features. For each test scenario, five covariance
matrices per class were randomly selected as training data and the rest were used for
testing. The random selection of training/testing data was repeated 20 times.
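
The region descriptors described above can be sketched in NumPy as follows. Derivatives are approximated with central differences (`np.gradient`), and a small ridge `eps` is added so that each descriptor is strictly positive definite; both choices, as well as the function names, are assumptions rather than details given in the text.

```python
import numpy as np

def split_regions(image, size=32):
    """Split an image into non-overlapping size x size regions (row-major)."""
    h, w = image.shape
    rows, cols = h // size, w // size
    trimmed = image[:rows * size, :cols * size]
    blocks = trimmed.reshape(rows, size, cols, size).swapaxes(1, 2)
    return blocks.reshape(-1, size, size)

def texture_features(region):
    """Per-pixel features [I, |dI/dx|, |dI/dy|, |d2I/dx2|, |d2I/dy2|]."""
    I = region.astype(float)
    Iy, Ix = np.gradient(I)        # derivatives along rows (y) and columns (x)
    Iyy = np.gradient(Iy, axis=0)
    Ixx = np.gradient(Ix, axis=1)
    F = np.stack([I, np.abs(Ix), np.abs(Iy), np.abs(Ixx), np.abs(Iyy)], axis=-1)
    return F.reshape(-1, 5)

def covariance_descriptor(region, eps=1e-6):
    """5 x 5 covariance of the pixel features, with a small ridge (assumed)."""
    C = np.cov(texture_features(region), rowvar=False)
    return C + eps * np.eye(C.shape[0])
```

For a 256 × 256 image, `split_regions` yields the 64 regions of size 32 × 32, each of which maps to one point on $Sym_5^+$.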

Fig. 3 compares the proposed RSR method against logE-SR [6,7] and TSC [9]. The 
proposed RSR approach obtains the highest recognition accuracy on all test scenarios 
except for the '5c' test, where it has slightly worse performance than TSC. 



Person Re-identification. We used the modified ETHZ dataset [31]. The original
ETHZ dataset was captured using a moving camera [28], providing a range of variations
in the appearance of people. The dataset is structured into 3 sequences. Sequence 1
contains 83 pedestrians (4,857 images), Sequence 2 contains 35 pedestrians (1,936 images),
and Sequence 3 contains 28 pedestrians (1,762 images). See Fig. 4 for examples.

Fig. 1. Examples of closely-cropped faces from the FERET 'b' subset (panel labels: bd, be,
bf, bg, bj, bk).

Fig. 2. Examples from the Brodatz texture dataset [27].

Fig. 3. Performance on the Brodatz texture dataset [27] using log-Euclidean sparse
representation (logE-SR) [6,7], Tensor Sparse Coding (TSC) [9] and the proposed RSR
approach. The black bars indicate standard deviations. (Horizontal axis: Test ID.)

Fig. 4. Examples of pedestrians in the ETHZ dataset [28].

Fig. 5. Performance on Sequences 1 and 2 of the ETHZ dataset (left and right panels,
respectively), in terms of Cumulative Matching Characteristic curves. The proposed RSR
method is compared with Histogram Plus Epitome (HPE) [29], Symmetry-Driven Accumulation
of Local Features (SDALF) [30] and Riemannian Locality Preserving Projection (RLPP) [10].
(Legend: RSR, RLPP, HPE, SDALF.)



We downsampled all images to 64 × 32 pixels. For each subject we randomly
selected 10 images for training and used the rest for testing. Random selection of training
and testing data was repeated 20 times to obtain reliable statistics. To describe each
image, the covariance descriptor was computed using the following features:

$$F_{x,y} = \left[\, x,\ y,\ R_{x,y},\ G_{x,y},\ B_{x,y},\ R'_{x,y},\ G'_{x,y},\ B'_{x,y},\ R''_{x,y},\ G''_{x,y},\ B''_{x,y} \,\right]$$

where $(x,y)$ is the position of a pixel, while $R_{x,y}$, $G_{x,y}$ and $B_{x,y}$ represent the
corresponding colour information. The gradient and Laplacian for colour C are represented
by $C'_{x,y} = \left[\, |\partial C/\partial x|,\ |\partial C/\partial y| \,\right]$ and $C''_{x,y} = \left[\, |\partial^2 C/\partial x^2|,\ |\partial^2 C/\partial y^2| \,\right]$, respectively.

We compared the proposed RSR method with several techniques previously used
for pedestrian detection: Histogram Plus Epitome (HPE) [29], Symmetry-Driven
Accumulation of Local Features (SDALF) [30], and Riemannian Locality Preserving
Projection (RLPP) [10]. The performance of logE-SR was below that of the HPE method
and is not shown. The results for TSC could not be generated in a timely manner, due to
the heavy computational load of the algorithm.

Results for Sequence 1 and 2 are shown in Fig. 5, in terms of cumulative matching 
characteristic (CMC) curves. The CMC curve represents the expectation of finding the 
correct match in the top n matches. The proposed method obtains the highest accuracy. 
For Sequence 3 (not shown), very similar performance is obtained by SDALF, RLPP 
and the proposed RSR, with HPE having the lowest performance. 
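
A CMC curve of the kind described above can be computed from a probe-by-gallery similarity matrix, as in the following NumPy sketch. It assumes that every probe identity appears exactly once among the gallery identities; the function name and input layout are hypothetical.

```python
import numpy as np

def cmc_curve(similarity, probe_ids, gallery_ids):
    """Fraction of probes whose correct match is within the top n, n = 1..G.

    similarity  : (P, G) array, larger means more similar
    probe_ids   : (P,) identity of each probe
    gallery_ids : (G,) identity of each gallery entry (assumed unique and
                  containing every probe identity)
    """
    probe_ids = np.asarray(probe_ids)
    gallery_ids = np.asarray(gallery_ids)
    order = np.argsort(-similarity, axis=1)        # best match first
    ranked_ids = gallery_ids[order]
    hits = ranked_ids == probe_ids[:, None]
    first_hit = hits.argmax(axis=1)                # 0-based rank of the match
    counts = np.bincount(first_hit, minlength=similarity.shape[1])
    return np.cumsum(counts) / len(probe_ids)
```

Entry n-1 of the returned vector is the empirical probability of finding the correct match in the top n matches.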



5.2 Dictionary Learning 

Here we compare the performance of the proposed Riemannian dictionary learning
technique (as described in Section 4), with the performances of dictionaries obtained
by random sampling and Riemannian k-means. We first use synthetic data to show
that the proposed method obtains a lower representation error in RKHS, followed by
classification experiments on texture data.

Synthetic Data. We synthesised 512 Riemannian samples from a set of 32 source
points in $Sym_5^+$. The source points can be considered as a form of ground-truth. The
synthesised samples were then used for dictionary creation by Riemannian k-means [24]
and the proposed algorithm.

To generate each source point, an SPD matrix was created by computing the
covariance of 100 random samples of a 5 dimensional normal distribution. The mean and
variance of the distribution are different for each source point. To synthesise each of
the 512 Riemannian samples, we uniformly selected $T = 4$ source points and combined
them with random positive weights, where the weights obeyed a normal distribution
with zero mean and unit variance.

The performance is measured in terms of representation error in RKHS, ie. Eqn. (10).
Fig. 6 shows the representation error as the algorithms iterate, with the proposed
algorithm obtaining a lower error than Riemannian k-means.




[Figure 6: representation error plotted against iteration (1 to 7) for Riemannian k-means,
the proposed method, and the source points.]

Fig. 6. Representation error of learned dictionaries in RKHS, Eqn. (10), for synthetic
data. The proposed method (Section 4) is compared with Riemannian k-means [24]. The
source points can be interpreted as a form of ground-truth.

    random          k-means         learning
    […].09 ± 1.5    53.20 ± 1.1     60.65 ± 0.9

Table 3. Recognition accuracy (in %) for the texture classification task with dictionary learning.
In all cases the proposed RSR approach was used, coupled with a dictionary generated via three
separate methods: random dictionary generation, Riemannian k-means [24], and the proposed
learning algorithm (Section 4).



Texture Classification. Here we consider a multi-class classification problem,
using 111 texture images of the Brodatz texture dataset [27]. From each image we
randomly extracted 50 blocks of size 32 × 32. To train the dictionary, 20 blocks from
each image were randomly selected, resulting in a dictionary learning problem with
2200 samples. From the remaining blocks, 20 per image were used as probe data and
10 as gallery samples. The process of random block creation and dictionary
generation was repeated twenty times. The average recognition accuracies over probe data
are reported here. In the same manner as in Section 5.1, we used the feature vector
$F(x,y) = \left[\, I(x,y),\ \left|\frac{\partial I}{\partial x}\right|,\ \left|\frac{\partial I}{\partial y}\right|,\ \left|\frac{\partial^2 I}{\partial x^2}\right|,\ \left|\frac{\partial^2 I}{\partial y^2}\right| \,\right]$ to create the covariance, where the first
dimension is the grayscale intensity, and the remaining dimensions capture first and
second order gradients.

We used the proposed RSR approach to obtain the sparse codes, coupled with a
dictionary generated via three separate methods: random dictionary generation,
Riemannian k-means algorithm [24], and the proposed learning algorithm (Section 4). The
sparse codes were then classified using a nearest-neighbour classifier.

For the randomly generated dictionary case, the classification rates are averaged
over 10 runs, with each run using a different random dictionary. For all methods,
dictionaries of size k = {8, 16, 24, ..., 128} were trained. The best results for each approach
(ie., the results for the dictionary size that obtained the highest recognition accuracy)
are reported in Table 3. For the random dictionary, k = 64; for the k-means dictionary,
k = 96; for the proposed dictionary learning algorithm, k = 24. The results show that
the proposed algorithm leads to a considerable gain in accuracy.



6 Main Findings and Future Directions 

With the aim of addressing sparse representation on Riemannian manifolds, we proposed
to seek the solution through embedding the manifolds into RKHS, with the aid of the
recently introduced Stein kernel. This led to a relaxed and extended version of the Lasso
problem [1] on Riemannian manifolds.

Experiments on several classification tasks (face recognition, texture classification,
person re-identification) show that the proposed approach achieves notable
improvements in discrimination accuracy, in comparison to state-of-the-art methods such as
tensor sparse coding, Riemannian locality preserving projection, and symmetry-driven
accumulation of local features. We conjecture that this stems from better
exploitation of Riemannian geometry, as the Stein kernel is related to geodesic distances via
a tight bound. The proposed sparse coding method is also considerably faster than the
state-of-the-art MAXDET reformulation used by Tensor Sparse Coding [9].

We have furthermore proposed an algorithm for learning a Riemannian dictionary,
closely tied to the Stein kernel. In comparison to Riemannian k-means [24], the
proposed algorithm obtains a lower representation error in RKHS and leads to improved
classification accuracies.

Future directions include using the Stein kernel for solving large margin classifi- 
cation problems on Riemannian manifolds. This translates to designing a machinery 
that maximises a margin on SPD matrices based on Stein divergence, which can be 
considered as an extension of support vector machines [13] to tensor spaces. 